Finding Semantically Related Words in Large Corpora
Abstract
The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss measures of semantic relatedness between basic word forms and describe the treatment of collocations. Next, we present a procedure for hierarchical clustering of a very large number of semantically related words and give examples of the resulting partitioning of the data in the form of a dendrogram. Finally, we show a form of output presentation that facilitates inspection of the resulting word clusters.
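The abstract does not specify which relatedness measure, collocation treatment, or linkage criterion the authors use, so the following is only a minimal Python sketch of the general pipeline under stated assumptions: word forms are represented by windowed co-occurrence counts compared with cosine similarity, known two-word collocations (a hypothetical, externally supplied set) are fused into single tokens before counting, and clustering uses average linkage. All function names are hypothetical. The list of merges it returns is one simple way to encode a dendrogram.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Sketch under assumptions: cosine similarity over co-occurrence counts and
# average-linkage clustering. This is NOT necessarily the paper's method.

def join_collocations(tokens, collocations):
    """Fuse known two-word collocations (hypothetical input set) into single tokens."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in collocations:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def cooccurrence_vectors(sentences, window=2):
    """Count, for each token, the tokens seen within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def average_linkage(words, sim):
    """Greedy agglomerative clustering; returns (merged cluster, linkage score) per step.

    Read top to bottom, the list of merges encodes a dendrogram.
    """
    clusters = {i: (w,) for i, w in enumerate(words)}
    next_id, merges = len(words), []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean similarity over all cross-cluster word pairs.
            pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
            score = sum(sim[x][y] for x, y in pairs) / len(pairs)
            if best is None or score > best[0]:
                best = (score, a, b)
        score, a, b = best
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = merged
        merges.append((merged, score))
        next_id += 1
    return merges

# Toy corpus (illustrative only): "cat"/"dog" and "car"/"truck" share their contexts.
corpus = [
    "the cat chased the mouse".split(),
    "the dog chased the mouse".split(),
    "the car drove on the road".split(),
    "the truck drove on the road".split(),
]
corpus = [join_collocations(s, set()) for s in corpus]  # no collocations in this toy data
vectors = cooccurrence_vectors(corpus)
words = ["cat", "dog", "car", "truck"]
sim = {x: {y: cosine(vectors[x], vectors[y]) for y in words} for x in words}
merges = average_linkage(words, sim)
for cluster, score in merges:
    print(f"{score:.3f}  {' '.join(cluster)}")
```

This naive loop recomputes every pairwise cluster score at each step, which does not scale to a very large vocabulary; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the more realistic choice.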
Similar resources
Investigation of Word Senses over Time Using Linguistic Corpora
Word sense induction is an important method to identify possible meanings of words. Word co-occurrences can group word contexts into semantically related topics. Besides the words themselves, temporal information provides another dimension to further investigate the development of word meanings over time. Large digital corpora of written language, such as those that are held by the CLARIN-D center...
Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity
There have been many proposals to extract semantically related words using measures of distributional similarity, but these typically are not able to distinguish between synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents translated into multiple lan...
A New Measure for Extracting Semantically Related Words
The identification of semantically related terms for a given word is an important problem. A number of statistical approaches have been proposed to address this problem. Most approaches draw their statistics from a large general corpus. In this paper, we propose to use specialized corpora which focus strongly on the individual words of interest. We propose to collect such corpora through target...
How textbooks (and learners) get it wrong: A corpus study of modal auxiliary verbs
Many elements contribute to the relative difficulty in acquiring specific aspects of English as a foreign language (Goldschneider & DeKeyser, 2001). Modal auxiliary verbs (e.g., could, might) are examples of a structure that is difficult for many learners. Not only are they particularly complex semantically, but especially in the Malaysian context ...
MiniCors and Cast3LB: Two Semantically Tagged Spanish Corpora
In this paper we present two Spanish corpora, MiniCors and Cast3LB, semantically tagged according to different annotation criteria and objectives. In order to guarantee the quality of the results, we have established a methodology for the development of these corpora. The resulting resources consist of a semantically tagged corpus according to the lexical sample task, and a semantically tagged ...
Publication date: 2001